AITopics | model explanation

Collaborating Authors

model explanation

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

0b9536e186a77feff516893a5f393f7a-Paper-Conference.pdf

Neural Information Processing SystemsFeb-18-2026, 21:55:14 GMT

explanation, explanation method, use case, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.94)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Human Computer Interaction (0.93)
Information Technology > Artificial Intelligence > Natural Language (0.93)

Add feedback

Learning to Generate Inversion-Resistant Model Explanations

Neural Information Processing SystemsFeb-9-2026, 17:56:40 GMT

Figure 1 compares the result of MI attacks using model predictions alone (PredMI) and explanation-aware MI attacks using model predictions and explanations together (ExpMI).

explanation, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country: North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report > New Finding (0.94)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
(2 more...)

Add feedback

0d770c496aa3da6d2c3f2bd19e7b9d6b-Paper.pdf

Neural Information Processing SystemsFeb-7-2026, 11:24:49 GMT

explainability, explanation, shapley value, (16 more...)

Neural Information Processing Systems

Country:

North America > Canada (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Industry:

Law (0.68)
Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Game Theory (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback

Learning to Generate Inversion-Resistant Model Explanations

Neural Information Processing SystemsDec-24-2025, 10:38:59 GMT

The wide adoption of deep neural networks (DNNs) in mission-critical applications has spurred the need for interpretable models that provide explanations of the model's decisions. Unfortunately, previous studies have demonstrated that model explanations facilitate information leakage, rendering DNN models vulnerable to model inversion attacks. These attacks enable the adversary to reconstruct original images based on model explanations, thus leaking privacy-sensitive features. To this end, we present Generative Noise Injector for Model Explanations (GNIME), a novel defense framework that perturbs model explanations to minimize the risk of model inversion attacks while preserving the interpretabilities of the generated explanations. Specifically, we formulate the defense training as a two-player minimax game between the inversion attack network on the one hand, which aims to invert model explanations, and the noise generator network on the other, which aims to inject perturbations to tamper with model inversion attacks. We demonstrate that GNIME significantly decreases the information leakage in model explanations, decreasing transferable classification accuracy in facial recognition models by up to 84.8% while preserving the original functionality of model explanations.

explanation, generate inversion-resistant model explanation, model explanation, (6 more...)

Neural Information Processing Systems

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.60)

Add feedback

CEBaB: Estimating the Causal Effects of Real-World Concepts on NLP Model Behavior

Neural Information Processing SystemsDec-24-2025, 10:36:35 GMT

The increasing size and complexity of modern ML systems has improved their predictive capabilities but made their behavior harder to explain. Many techniques for model explanation have been developed in response, but we lack clear criteria for assessing these techniques. In this paper, we cast model explanation as the causal inference problem of estimating causal effects of real-world concepts on the output behavior of ML models given actual input data. We introduce CEBaB, a new benchmark dataset for assessing concept-based explanation methods in Natural Language Processing (NLP). CEBaB consists of short restaurant reviews with human-generated counterfactual reviews in which an aspect (food, noise, ambiance, service) of the dining experience was modified. Original and counterfactual reviews are annotated with multiply-validated sentiment ratings at the aspect-level and review-level. The rich structure of CEBaB allows us to go beyond input features to study the effects of abstract, real-world concepts on model behavior. We use CEBaB to compare the quality of a range of concept-based explanation methods covering different assumptions and conceptions of the problem, and we seek to establish natural metrics for comparative assessments of these methods.

causal effect, cebab, real-world concept, (7 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.77)

Add feedback

Debugging Tests for Model Explanations

Neural Information Processing SystemsDec-23-2025, 17:36:27 GMT

We investigate whether post-hoc model explanations are effective for diagnosing model errors--model debugging. In response to the challenge of explaining a model's prediction, a vast array of explanation methods have been proposed. Despite increasing use, it is unclear if they are effective. To start, we categorize \textit{bugs}, based on their source, into: ~\textit{data, model, and test-time} contamination bugs. For several explanation methods, we assess their ability to: detect spurious correlation artifacts (data contamination), diagnose mislabeled training examples (data contamination), differentiate between a (partially) re-initialized model and a trained one (model contamination), and detect out-of-distribution inputs (test-time contamination). We find that the methods tested are able to diagnose a spurious background bug, but not conclusively identify mislabeled training examples. In addition, a class of methods, that modify the back-propagation algorithm are invariant to the higher layer parameters of a deep network; hence, ineffective for diagnosing model contamination. We complement our analysis with a human subject study, and find that subjects fail to identify defective models using attributions, but instead rely, primarily, on model predictions. Taken together, our results provide guidance for practitioners and researchers turning to explanations as tools for model debugging.

contamination, debugging test, name change, (9 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)

Add feedback

SX-GeoTree: Self-eXplaining Geospatial Regression Tree Incorporating the Spatial Similarity of Feature Attributions

Kang, Chaogui, Luo, Lijian, Guan, Qingfeng, Liu, Yu

arXiv.org Machine LearningNov-26-2025

Decision trees remain central for tabular prediction but struggle with (i) capturing spatial dependence and (ii) producing locally stable (robust) explanations. We present SX-GeoTree, a self-explaining geospatial regression tree that integrates three coupled objectives during recursive splitting: impurity reduction (MSE), spatial residual control (global Moran's I), and explanation robustness via modularity maximization on a consensus similarity network formed from (a) geographically weighted regression (GWR) coefficient distances (stimulus-response similarity) and (b) SHAP attribution distances (explanatory similarity). We recast local Lipschitz continuity of feature attributions as a network community preservation problem, enabling scalable enforcement of spatially coherent explanations without per-sample neighborhood searches. Experiments on two exemplar tasks (county-level GDP in Fujian, n=83; point-wise housing prices in Seattle, n=21,613) show SX-GeoTree maintains competitive predictive accuracy (within 0.01 $R^{2}$ of decision trees) while improving residual spatial evenness and doubling attribution consensus (modularity: Fujian 0.19 vs 0.09; Seattle 0.10 vs 0.05). Ablation confirms Moran's I and modularity terms are complementary; removing either degrades both spatial residual structure and explanation stability. The framework demonstrates how spatial similarity - extended beyond geometric proximity through GWR-derived local relationships - can be embedded in interpretable models, advancing trustworthy geospatial machine learning and offering a transferable template for domain-aware explainability.

attribution, explanation, prediction, (16 more...)

arXiv.org Machine Learning

2511.19845

Country:

Asia > China > Hubei Province > Wuhan (0.04)
Asia > China > Fujian Province (0.04)
North America > United States > Oregon (0.04)
(3 more...)

Genre: Research Report (1.00)

Industry: Banking & Finance (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

SHAP-Based Supervised Clustering for Sample Classification and the Generalized Waterfall Plot

Lin, Justin, Fukuyama, Julia

arXiv.org Machine LearningOct-13-2025

In this growing age of data and technology, large black-box models are becoming the norm due to their ability to handle vast amounts of data and learn incredibly complex input-output relationships. The deficiency of these methods, however, is their inability to explain the prediction process, making them untrustworthy and their use precarious in high-stakes situations. SHapley Additive exPlanations (SHAP) analysis is an explainable AI method growing in popularity for its ability to explain model predictions in terms of the original features. For each sample and feature in the data set, we associate a SHAP value that quantifies the contribution of that feature to the prediction of that sample. Clustering these SHAP values can provide insight into the data by grouping samples that not only received the same prediction, but received the same prediction for similar reasons. In doing so, we map the various pathways through which distinct samples arrive at the same prediction. To showcase this methodology, we present a simulated experiment in addition to a case study in Alzheimer's disease using data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database. We also present a novel generalization of the waterfall plot for multi-classification.

artificial intelligence, machine learning, shap value, (17 more...)

arXiv.org Machine Learning

2510.08737

Country: North America > United States > California (0.28)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback